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We study the mechanical unfolding of a simple model protein. The Langevin dynamics results 
are analyzed using Markov-model methods which allow to describe completely the configurational 
space of the system. Using transition path theory we also provide a quantitative description of 
the unfolding pathways followed by the system. Our study shows a complex dynamical scenario. 

In particular, we see that the usual one-dimensional picture: free-energy vs end-to-end distance 
representation, gives a misleading description of the process. Unfolding can occur following different 
pathways and configurations which seem to play a central role in one-dimensional pictures are not 
the intermediate states of the unfolding dynamics. 

PACS numbers: 87.15.hm, 87.15.A-, 87.15.H-, 87.15.La 


I. INTRODUCTION 

The characterization of folding and unfolding energy 
landscapes of bionrolecules is a major problem in bio¬ 
physics which sheds light onto biomolecules’ role and 
function 03 - 0 - In this effort, the emergence of single¬ 
molecule techniques that let the manipulation of indi¬ 
vidual molecules has opened a new wide field, allowing 
to monitor unfolding processes by looking into a single 
specimen [SHH]- 

In force-pulling experiments, the one-dimensional de¬ 
scription is usually adopted, as force is considered to im¬ 
pose a preferred direction that appears as the slowest 
degree of freedom compared with the remaining ones. In 
this sense, optical tweezers 00, magnetic tweezers 00 
or afm mm experiments are usually analyzed con¬ 
sidering the end-to-end distance as the proper reaction 
coordinate, with a well developed force spectroscopy the¬ 
ory [R>HIH| that allows stating predictions grounded on 
this hypothesis. Also, recent studies of single molecule 
Foester resonant energy transfer fluorescence study ther¬ 
mal unfolding by tracking the radius of gyration of in¬ 
dividual molecules mm- Computational works simi¬ 
larly take advantage of this simple description, choosing 
reaction coordinates such as the fraction of native con¬ 
tacts Q rnim?| , the RMSD from the native structure |23] 
or the Principal Components pHIETl . Nevertheless, this 
tempting approach must be used with great care, as some 
energy minima which represent relevant metastable con¬ 
formations and the barriers connecting such states may 
be hidden when projecting the actual large-dimensional 
free energy landscape onto a low-dimensional subspace. 
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Besides, one dimensional profiles might suggest mislead¬ 
ing unfolding paths, consequence of this projection re¬ 
striction. 

In order to explore such aspects, we choose a coarse¬ 
grained model protein PM5T1 and study it through a 
force-clamp protocol. The output of the simulations will 
be analyzed through two different approaches, allowing 
a comparison between the conclusions yielded by each. 
First, we build one dimensional free-energy profiles along 
the end-to-end distance and the fraction of native con¬ 
tacts. Second, we describe the configurational space of 
the system by using Markov-Model methods [55HTT| and 
obtain the unfolding paths applying transition-path the¬ 
ory [42H45]- 

Although recent works cast doubt on a simple low 
dimensional description of thermal (un)folding pro¬ 
cesses [21 0B], the one-dimensional approach is usually 
adopted for mechanical unfolding processes, due to the 
privileged direction imposed by the force [Tol ITT] , In 
the case studied here, this fact, together with the sim¬ 
plicity of the protein structure, apparently point to a 
valid one-dimensional description of the unfolding pro¬ 
cess. Nevertheless, we find out that one-dimensional pro¬ 
files lead to deceptive conclusions. In particular, these 
profiles suggest the existence of a metastable state (the 
half-stretched configuration, see Fig. [I]) as a mechani¬ 
cal intermediate between the native and stretched states. 
Opposed to this, we find that unfolding occurs through 
two major routes defined by the existence of two differ¬ 
ent mechanical intermediates, not identified in the one¬ 
dimensional description. Although very stable, the half- 
stretched configuration plays a marginal role in the un¬ 
folding process. This multi-path picture can never be 
captured through a one-dimensional description. In ad¬ 
dition we are able to systematically define all the individ¬ 
ual unfolding pathways calculating their relative weight 
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in the dynamics and yielding a complete and quantita¬ 
tive vision of the protein’s landscape that completes the 
picture described in previous studies on the same sys¬ 
tem 52 ] 155 ] 156]. 


II. MODEL 

The BLN model [28l 122] is a coarse grained off- 
lattice protein model in which the residues are repre¬ 
sented by “colored” beads, hydrophobic (B), hydrophilic 
(L) and neutral (N). Due to its rich behavior, de¬ 
spite its simplicity, this model has been widely stud¬ 
ied, with several modifications through time [5DH551 [551 
155] , In particular, the 46-residue sequence (BLN-46) 
B g N 3 (LB) A N 3 B g N 3 {LB) 5 L folds into a four-strand /3 
barrel showing nonetheless a frustrated ground state [35]. 

The potential terms we use account for a stiff nearest- 
neighbor harmonic potential, a three-body bending inter¬ 
action, a four-body dihedral interaction and a sequence 
dependent Lennard-Jones potential [551136j : 

1 Af_1 

Vbln = -^K 22 ( r i,i +1 - r- 0 ) 2 (1) 

i=i 

N—2 

+ 22 [A cos ®i + B cos 26i — Vo] 

»=1 
N -3 

+ 22 [Ci(l + cos^j) + Di(l + cos3^,)] 

»=1 



where <\ :j is the distance between residues i and j. 6 is the 
bending angle and cf> the dihedral angle. For parameter 
values see [35] and Appendix A. 

We simulate the system by integrating Langevin equa¬ 
tions of motion at constant temperature T and following 
a force-clamp protocol, where monomer 1 is fixed while a 
constant force is applied to the last monomer, 46, through 
a linear spring. Such equations are given by 

mr i = -'yr i ~ViV B LN + F i +r)i, ( 2 ) 

where m is each residue unitary mass, 7 the friction 
coefficient, F the external force applied in the z direc¬ 
tion and r]i Gaussian white noise of zero average, holding 
fluctuation-dissipation theorem (rjir/j) = 2Ty8(t — 

This model protein has a well characterized unfolding 
transition (see [55] and Appendix C) at T c and unfolds 
mechanically at Fjj. We work from now on at T = 0.55 T c 
and F = 0.8 Fjj in order to maximize the number of con¬ 
figurations visited by the system. Lower forces would not 
populate the unfolded state while above Fjj the unfolding 
would be irreversible. 


III. METHODS 

We present here the different methods use to analyze 
the simulated trajectories in order to understand the me¬ 
chanical unfolding scenario of our model system. 

A. Potential of Mean Force 

The Potential of Mean Force (PMF) is a low dimen¬ 
sional (typically one-dimensional) characterization of the 
free energy landscape of a system, which relies on the 
choice of a reaction coordinate X. The PMF is simply 
F/ksT = — logP(A'), where P(X) is the probability 
density of the chosen reaction coordinate X. 

We will explore the PMF of the system (section IV. A) 
by using two different reaction coordinates. As the me¬ 
chanical force imposes a privileged direction, the end- 
to-end distance £ = |rjy — ro| appears as a natural 
choice. This magnitude is indeed widely used in most 
single molecule force spectroscopy applications PH 08]- 
[50]. Additionally, we use the fraction of native contacts Q 
1 19, 22], often reported in computational applications as 
a good magnitude for describing protein unfolding, based 
on the importance of topology on protein structure. 

B. Principal Components Analysis 

Principal Component Analysis (PCA) is a standard 
statistical method for reducing the dimensionality of a 
complex system such as biological molecule [251127] . PCA 
performs a linear transformation by diagonalizing the co- 
variance matrix Cij = {yiDj) — ( yi){yj )> removing thus all 
internal correlations. The Principal Components (PCs) 
qi are calculated as the projection of the trajectory onto 
each eigenspace. If we order the eigenvalues, the first 
largest PCs contain most of the fluctuations of the sys¬ 
tem and can be used as adequate reaction coordinates. 

C. Conformational Markov Network 

In order to characterize the thermodynamical and 
kinetic properties of our system we build a Markov 
Model [55] [52] by discretizing the state space of our 
molecule into a set S = {1,2, •• • , M} of M conforma¬ 
tional states defining the Conformational Markov Net¬ 
work of the system mm- For our system, the confor¬ 
mational space is defined as the first three PCs, reducing 
greatly its dimensionality but keeping its essential fea¬ 
tures. With these three coordinates we maintain the 75% 
of the system fluctuations, while the remaining ones ac¬ 
count for symmetric thermal fluctuations. Each of the 
coordinates is discretized into 30 bins of equal volume, 
thus M = 27000. 

The Conformational Markov Network is built from the 
dynamical trajectories, by counting the occupation of 
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each of the states 7 q and calculating the transition matrix 
Tij which measures the probability of going from state i 
to state j within time r, being r the time window or lag 
time used to analyze our trajectories (r = 15ps in our 
case). 

The transition matrix T is ergodic and, if the molecule 
is in equilibrium, the occupation distribution 7 can be 
recovered as the eigenvector with eigenvalue 1. In such 
situation, detail balance condition holds, 7 qZjj = 7r jTji , 
and 7 T is the Boltzmann distribution. 


D. Basins of attraction Network 

As the Conformational Markov Network is typically 
made up of thousands of nodes and links, hardly any 
relevant physical information can be directly obtained. A 
clustering or coarse-graining process is usually followed 
in order to group together nodes with similar physical 
features leading to an smaller, more meaningful network. 

Here we apply the Stochastic Steepest Descent algo¬ 
rithm m (see Appendix B.2 for detailed algorithm). 
The advantage of this algorithm is that the network 
is systematically split into its basins of attraction i.e. 
groups of nodes whose probability flux converges into 
a single node (minimum). The coarse-graining process 
does not rely in any arbitrary definition, but on the ki¬ 
netic properties of the system. Physically, while each 
node would represent microstates of the system, the 
basins of attraction represent macrostates. 

Onto this network we calculate a new transition matrix 
and the occupation probability of each basins 7 Tj. Free 
energy differences from basin i and j are given by A Fij = 
—kBTlogTTi/TTj. The mean escape time from basin i is 
defined as (t s ) = r/( 1 — Ta ), where r is the time window 
used to sample the configurations, while transition times 
between basins i and j are defined as Tj_>j = r /Tij. 


E. Transition-Path Theory 

The Markov Network defined above contains all ther¬ 
modynamic and kinetic information of the system. Nev¬ 
ertheless, we are interested in computing the transition 
pathways between the set of native conformations to 
the fully stretched conformation. Transition-Path the¬ 
ory provides the necessary tools for doing this jT?ll' 11 j . 
We define A as the subset of basins which represent the 
native conformation while B is the subset of stretched 
basins. Our question is which is the typical sequence of 
intermediate I states to go from A to B. 

The committor probability is defined as the proba¬ 
bility, when starting at state i, to reach set B next rather 
than A. In our case, this is the unfolding probability. By 
definition qf = 0 if i € A and qf = 1 if * S B. Math¬ 
ematically, the committor probability can be computed 


by solving the following system of linear equations: 

-«, + +EW = -E r «- ( 3 ) 

fee/ fees 

For a molecule in equilibrium, the backward- 
committor probability q~ is simply q~ = 1 — qf. 

The transition matrix Tjj contains information from 
every possible trajectory which appears in the equilib¬ 
rium ensemble of the molecule. In order to extract the 
contributions from the unfolding trajectories A B, we 
calculate the effective flux /, 7 defined as the probability 
flux from i —> j contributing to the A —► B transition: 

fij - ~,q, 1 1 j , lj ■ (4) 

If we want to calculate the unfolding flux, removing 
recrossings which might appear in a A —> B transition, 
we need to define the net flux as 

fij = max[0, fij — fji\, (5) 

fL defines a network of fluxes that go from A to B. The 
total unfolding flux F represents the expected number of 
A -A B transitions per time window r and is defined as: 

f = J2J2 ■ ( 6 ) 

i&A j^A 

In order to decompose this flux network onto individual 
pathways Pi, different approaches can be applied HUBS] . 
Here we base our strategy on the bottleneck algorithm, 
where given an individual pathway, the bottleneck (rate 
limiting step) is identified as the minimal net flux of the 
path fi and subtracted from every remaining net flux 
ffj. The process is iterated until the network is fully 
decomposed into a set of individual pathways Pi. 

IV. RESULTS 

In order to elucidate the unfolding mechanism under 
the effect of mechanical force for our model protein, we 
have performed six long equilibrium simulations. Every 
simulation starts from the native configuration, is equili¬ 
brated for 3 fis and then runs up to 3 ms. 

A. One dimensional description: the Potential of 
Mean Force 

Figure [l] shows the PMF calculated along the end-to- 
end distance £ and the fraction of native contacts Q of our 
model protein. The profile for £ shows four clear minima 
that can be identified with four different configurations, 
considering that each of the (3 strands has a length of 
£ ~ 3 nm. In the native configuration ( N) £ ~ 0 nm, as 
the extremal /3 strands are oriented in the same direction. 
In the aligned configuration ( Al ) the second strand (LB )4 
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is bent so that the extremes are aligned in the pulling di¬ 
rection and £ ~ 3 nm. The half-stretched configuration 
( HS ) appears as an stable minima at £ ~ 6 nm, as the 
fourth (LB );5 strand is unfolded. The fully stretched con¬ 
figuration (S), with £ ~ 12 nm, shows the protein totally 
unfolded, as an stretched polymer. 

These states can also be identified in the Q profile. 
State S has all contacts broken Q ~ 0, while Al and HS 
maintain around half of the contacts (Q ~ 0.5). The N 
configuration shows a minimum at Q ~ 0.75, as thermal 
fluctuations break on average some of the contacts. 




Q 

FIG. 1. Potential of Mean Force as a function of the end-to-end 
distance £ and the fraction of native contacts Q. 

Remarkably, for this value of the force, the HS config¬ 
uration correspond to the lowest minimum in both free 
energy profiles, and thus is the most stable configura¬ 
tion. Its position in the PMF suggests that it also has 
a relevant role in the stretching pathways, appearing as 
a clear mechanical intermediate between the native and 
fully stretched configuration. In addition, it is necessary 
to jump over a barrier of several ksT to reach state S 
while the other states are separated by low barrier. This 
suggest a fast dynamics between N and HS and longer 
time scales to visit state S. 


B. Two dimensional description: Principal 
Component Analysis 

Before describing the Markov Model of the system, it is 
worth to exploit further the information PCA provides. 
As explained previously, we build the Markov network 
by discretizing the first three PCs, which define our con¬ 
formational space, with lower dimensionality, but still 
capturing the main aspects of the system dynamics. 



-60 -40 -20 0 20 

First Principal Component 


FIG. 2. Free-energy landscape along the first two PCs. 

Figure [2] shows the free-energy landscape along the first 
two principal components AF/k/jT = — log P(qi, < 72 )- Its 
basic features agree with the one dimensional landscapes 
shown in previous section, as three major wells are found. 
Nevertheless we see also clear differences, being the PCs 
able to capture better the details of the free-energy land¬ 
scape. Each of these major wells have a rough struc¬ 
ture, showing a set of minor wells separated by small en¬ 
ergy barriers ~ 2revealing thus a richer variety of 
configurations. Moreover, two new low populated wells 
appear between the folded structures (native and half- 
stretched) and the fully-stretched configurations. These 
new states could suggest the existence of different un¬ 
folding pathways, where the half-stretched configuration 
does not necessarily plays the role of mechanical inter¬ 
mediate. 


C. Equilibrium ensemble of the model protein: the 
Basin Network 

The built microstate network is made up of 1876 nodes 
related kinetically through 23995 links. After applying 
the Stochastic Steepest Descent algorithm [41], the net¬ 
work is clustered into 30 basins connected through 1290 
links. In order to obtain a good description of the sys¬ 
tem, we keep only those basins which were visited at least 
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FIG. 3. Basins of attraction Markov Network (Upper). We rep¬ 
resent the 13 basins with n > 10“ 5 where the size of the bead 
is proportional to 7Tj. The bidirectional arrows connecting nodes 
represent allowed transitions (the magnitude of Tij is not shown). 
Each basin is labelled according to the configuration they encode. 
Representative structure associated to each basin (Lower). 

0 .001% of the trajectory (7r, t > 10 -5 ), avoiding patholog¬ 
ical or extremely rare states. After this refinements, we 
keep 13 macrostates, connected through 65 edges, includ¬ 
ing auto-links. 

Figure [3] (upper) shows a graphical representation of 
the basin network, where the size of each bead (node) is 
proportional to its occupation 7r,> The spatial arrange¬ 
ment of the nodes was calculated applying the Force Atlas 
algorithm EP, where an artificial dynamics is simulated. 
This dynamics is based in considering each link as a linear 
spring and including a certain repulsion between nodes, 
until an equilibrium configuration is obtained. The nodes 
are colored according to the modularity class they be¬ 


long to [55], having five different classes. Lower panel 
of Fig. [3] shows a representative structure of each basin 
(macrostate), including the label which identifies them. 

Configurations Ni and IV 2 correspond to native-like 
states and will define the native set A due to its structural 
similarity and high Q value. The aligned configuration 
Al, already identified in Fig. [l] appears close to Ni and 
N 2 in Fig. [3] but does not belong to the native set since 
it gives very different Q and £ values. Basin HS is the 
Half-Stretched Configuration, the most stable macrostate 
under these conditions. State S is the Fully-Stretched 
Configuration, while the remaining 8 basins are labelled 
as intermediate states and will be discussed further on. 


TABLE I. Description of the basins of attraction. 


# 


(t s ) [ps] 

(Q) 

(0 H 

qf 

Nx 

0.15 

559 

0.75 

0.8 

0.0 

n 2 

0.14 

495 

0.73 

0.9 

0.0 

Al 

0.14 

272 

0.40 

2.6 

1.4 x 10" 4 

HS 

0.44 

2982 

0.46 

6.5 

9.2 x 10" 4 

h 

0.07 

362 

0.25 

4.8 

1.2 x 10" 3 

h 

0.01 

2586 

0.35 

6.8 

0.12 

h 

6.67 x 10" 5 

120 

0.12 

9.0 

0.29 

h 

1.3 x 10~ 4 

198 

0.11 

10.1 

0.34 

h 

1.9 x 10~ 5 

64 

0.10 

9.6 

0.51 

h 

3.9 x 10~ 4 

163 

0.14 

8.55 

0.53 

h 

3.3 x 10~ 4 

176 

0.13 

9.35 

0.58 

h 

2.5 x 10- 5 

56 

0.09 

10.5 

0.71 

S 

0.06 

75000 

0.01 

13.7 

1 


Table U shows information about each of the identified 
macrostates, n * is the occupation of basin i, (t s ) the 
mean escape time (defined above), (Q) the mean fraction 
of native contacts and (£) the mean end-to-end distance, 
both calculated from the marginal distributions of such 
magnitudes on each basin. It is remarkable that in many 
cases such distributions are not unimodal, so the actual 
meaning of the average must be taken with care. Finally, 
qf are the committor probabilities from the native (N\ 
and N2) to the stretched (S) configuration this is: the 
unfolding probability of basin i. 

It is important to stress the difference between the two 
native basins Ni and IV 2 , as they have very different 
connectivity features in the network, belonging to dif¬ 
ferent modularity classes. Configuration Ni is closer to 
the native structure, given the arrangement of the neu¬ 
tral turns, while IV 2 shows bigger fluctuations, leading 
to a loss of some contacts. Interestingly, N\ is more 
connected to the Intermediate States than N%, which 
shows fast transition times to HS, tn^hs = 557ps, 
while tnx^hs = 13-5 x 10 6 ps. In fact, they are both 
scarcely connected -tn 2 ^.n 1 = 14 x 10 3 ps and tn 1 ^.n 2 = 
15 x 10 3 ps-, reason why they belong to a different mod¬ 
ularity class. In this regard, in spite its structural simi¬ 
larity which overlap both states in the PMF description, 
their actual role in the configurational space is quite dif¬ 
ferent. 
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In this sense, the first contradictions with the conclu¬ 
sions yielded by the PMF description appear here. While 
both descriptions agree coarsely in the main features of 
the equilibrium ensemble of the system, revealing three 
major states (native, half-stretched and fully-stretched), 
the role of such states and the presence of other relevant 
configurations is hidden in the one-dimensional projec¬ 
tion. Ni and (V 2 states are integrated into the same high 
Q or low £ minimum, will the intermediate low-populated 
states which connect to the stretched state are impossible 
to be identified in the one-dimensional representation. 


D. The unfolding pathways: Transition Path 
Theory 

In order to decipher the actual unfolding mechanism of 
our model protein under the effect of a mechanical force, 
we apply Transition Path theory to the basin network, 
as explained in Methods section. 

We define the native set A as basins Ni and iV 2 , while 
the stretched set B is just made up of basin S. Ac¬ 
cording to this definitions, we calculate the committor 
probabilities, shown in Table [T| Figure [4] shows the net 
flux network, being the thickness of the arrows propor¬ 
tional to the net flux /A. The total unfolding flux is 
F = 2.9 x 10 _ 7 ps _1 , meaning that we observe an unfold¬ 
ing transition every 3.5/xs, approximately. 

We decompose the net flux network by identifying first 
the strongest pathway, remove it from the network and 
repeat the process until there is no path from set A to 
set B. Due to the size of our network, this process can be 
done manually, although computational applications can 
be used [H| 35] ■ We identify a total of 9 different paths 
leading from A to B. After decomposing the network 
into these 9 paths, unconnected regions still remain due 
to the presence of trap states [13] that carry around 20 % 
of the flux. Figure [5] shows the 6 more relevant paths, 
which carry 89% of the unfolding flux. 

From the 9 pathways, 7 start from conformation Ni 
while just 2 from iV 2 . This is a remarkable fact, being 
Ni closer to the native structure than 7V 2 , as discussed 
in previous section. In addition, states I\ and / 2 appear 
as the actual intermediates for the unfolding mechanism: 
A —> B is forbidden in case these two states are removed 
from the net flux network. Out of the 9 pathways, 6 of 
them pass through state J 2 and 3 through state I\. 

The construction of the Markov Model from the PCs 
and the use of Transition Path Theory help us to unveil 
the actual unfolding mechanism and its driving process. 
While HS is a notably relevant metastable state (indeed 
the most stable state under these conditions), its role in 
the unfolding mechanism is completely marginal, as just 
appears in path P 5 , with a weight of 7%. This impor¬ 
tant conclusion contradicts those derived from the one¬ 
dimensional description showed in Fig. [l] where HS was 
suggested as the mechanical intermediate of the unfold¬ 
ing mechanism. The actual mechanical intermediates are 



FIG. 4. Folding flux for the model protein. The network de¬ 
picts the 85% most relevant unfolding pathways for the 46 — mer 
BLN model protein Each of the 13 configuration identified with 
the Stochastic Steepest Descent algorithm are shown here, together 
with the label which identifies them. The configurations are ar¬ 
ranged vertically according to their committor probability (not in 
scale). The arrows connecting configurations represent the unfold¬ 
ing net flux /4, with their thickness is proportional to the mag¬ 
nitude of the flux. The numbers next to the arrows give the flux 
magnitude in 10~ 9 ps —1 . 

Ii and / 2 (not identified in the one-dimensional descrip¬ 
tion), defining the two major unfolding routes. / 2 has 
a similar structure to HS, but while HS maintains the 
hydrophobic core, in HS the extremal Bg strand is un¬ 
folded, breaking the core that stabilizes the structure and 
driving the unfolding mechanism. On the other hand, I\ 
is more stable tt Ii = 0.07 and represents a modified HS 
structure where the folded branches collapse into a globu- 
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Native (A) 



II lit 

\ \ \ \ \ \ 


Pjt 25% P 2 :20% P 3 :15% P 4 :13% P 5 :7% P g : 7% 

Stretched (B) 

FIG. 5. Model protein unfolding pathways. The six pathways 
carrying most of the total flux (up to 89%) are explicitly shown. 

lar structure which might lead to expose the extremal Bg 
branch to the solvent and drive the unfolding mechanism 
through states I 4 and 1$. 


V. CONCLUSIONS AND DISCUSSION 

In this paper we have presented the detailed analysis of 
the unfolding process of a model protein under the pres¬ 
ence of a mechanical pulling force. This scenario mim¬ 
ics force clamp single molecule experiments, where pro¬ 
teins or nucleic acids are subject to a constant external 
force that drives their unfolding. Due to the limitation 
of available observables, these experiments are often ana¬ 
lyzed by reconstructing their free-energy landscape along 
the pulling direction through different existing techniques 
[IMSl|48H50j. This approach is often followed in many 
computational studies by using different reaction coordi¬ 
nates [HH53]. 

In this sense we wanted to reproduce a similar protocol 
and explore the conclusions yielded by a one-dimensional 
analysis and a multidimensional Markov model approach. 
The simplicity of our model protein, and the fact that 
the force sets a privileged direction invites to a one¬ 
dimensional characterization. Nonetheless, we have seen 
how both approaches lead to contradictory conclusions. 


The PMF description shows the existence of three ma¬ 
jor states, the native, the stretched or denatured and a 
metastable Half-Stretched configuration which seems to 
play the role of mechanical intermediate due to its posi¬ 
tion in the free-energy profile. 

Nonetheless, a more detailed multidimensional study 
changes dramatically the unfolding picture. Being the 
most populate one, HS state plays a marginal role in 
the unfolding pathway, with just 7% of the unfolding 
flux passing through it. The true mechanical interme¬ 
diates are states I\ and I 2 , building the two major un¬ 
folding routes, both related to the loss of the hydropho¬ 
bic core that destabilizes the structure and drives the 
unfolding process. In this sense, due to the existence 
of multiple pathways, independently of the chosen reac¬ 
tion coordinate, a one-dimensional picture would never 
be enough to characterize the unfolding pathway of this 
system. Thus, our work differs from those which put 
attention on the proper choice of the reaction coordi¬ 
nate pa 07]. The necessity of multidimensional descrip¬ 
tions indeed has been warned in the last years to under¬ 
stand thermal unfolding, where the protein transits from 
a low-entropy state (native) to a high-entropy one (dena¬ 
tured) l2j] iH. The one-dimensional picture, however, is 
vastly assumed in mechanical unfolding processes, both 
in experimental and computational applications. 

Regarding our analysis Markov Model protocol, we 
stress two major differences when compared to most 
works of this community. First, it is important to note 
that we are actually using the PCs as reaction coordi¬ 
nates in order to reduce the system dimensionality. Nev¬ 
ertheless, these coordinates has been proven to capture 
successfully the most relevant dynamical events of com¬ 
plex systems such as biomolecules. In our case, three 
coordinates are enough, as the remaining ones account 
merely for gaussian thermal fluctuations. Second, we 
stress on the importance of the coarse-graining mecha¬ 
nism applied to the original Conformational Markov Net¬ 
work [TT] , which is able to systematically cluster the net¬ 
work based only on the kinetic properties of the system. 

Although extremely simple molecular assays such as 
DNA or RNA hairpins could fit into a single reaction co¬ 
ordinate description [48], increasing slightly the complex¬ 
ity of the molecule leads to a dramatical rise in the com¬ 
plexity of the actual free energy landscape in the system, 
requiring more detailed studies. In this sense, molecules 
such as multiple nucleic-acid hairpins |53| , protein-ligand 
complexes [53] or any mechanically pulled protein [53], 
appear as potential systems where a one-dimensional de¬ 
scription takes the risk of leading to a clear misunder¬ 
standing of the actual complexity of their conformational 
space and the dynamical processes to which they are sub¬ 
ject. 
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Appendix A: Model parameters and simulation 
protocol 

We simulate our system using the following adimen- 
sional parameters in Eq. (1),: 

• Vi: K = 50, r 0 = 1. 

• V 2 : A = 5.118, B = 5.308, V 0 = -5.295 

• V 3 ' Ci = 0 and Di = 0.2 if two or more aminoacids 
are neutral, and Ci = Di = 1.2 otherwise. 

• V 4 : there are three different cases, according to the 
character of the aminoacids. 

1. Cij = 0 and = 4 if i or j are neutral. 

2. = 1 and e^- = 4 if i and j are hydrophobic. 

3. = — 1 and e = 8/3 in the remaining cases. 

All simulations were carried out using self-built code, in¬ 
tegrating the overdamped Langevin equations described 
above with an stochastic second order Runge-Kutta al¬ 
gorithm [5B] , 

Physical units can be easily recovered in the follow¬ 
ing way. Length unit is defined by the C a — C a dis¬ 
tance r 0 = 0.38 nm. Energy units are defined as the 
energy of a hydrogen bond €h ~ 1.7/cbT, being force 

units F « 17.3pN. Mass unit is that of an average 

aminoacid m a sa 3 x 10~ 22 kg. In this sense our time 
units t = \/m a r^l ch ~ 3 ps, and the damping is that of 
water 7 « 10 

Six trajectories at F = 0.8 Fjj were simulated (with 
Fj ss 20 pN), were monomer 1 was kept fixed while force 
was exerted to monomer N through a linear spring. Each 
simulation covered a total time of 3 ms, with a previous 
thermalization process of 3 ps. The integration step is 
dt = 0.005f and the time window to sample the trajecto¬ 
ries r = 5 t. 


Appendix B 

1. Conformational Markov Network 

The Conformational Markov Network (CMN) [40l |4T] 
appears as a useful coarse-grained representation of large 
stochastic trajectories. This picture is obtained by dis¬ 
cretizing the conformational space explored by the sys¬ 
tem and considering the dynamical jumps between the 
discretized configurations along the simulation. In this 
sense, the nodes of the complex network are defined by 


the discretized states, while the links account for the ob¬ 
served transitions between them. The arising network is 
thus a weighted and directed graph. 

In our case, the conformational space is defined by the 
three first principal components, in order to reduce the 
number of degrees of freedom, keeping indeed the es¬ 
sential features of our system. We divide each of the 
principal component into 30 cells of equal volume. Our 
discretized conformational space is thus made up of 30 3 
posible states, which may be or not occupied within the 
stochastic trajectory. We assign each node a weight 77 
accounting for the fraction of trajectory that the sys¬ 
tem has visited within the trajectory. The normalization 
condition )T/ 77 = 1 holds. Secondly, the value T l3 is 
assigned to each directional link accounting for the dy¬ 
namical jumps from node j to i. Self-loops can exist, 
and thus Tn ^ 0. Finally the normalization condition 
J2i Tij = 1 is forced. According to this, the CMN is to¬ 
tally defined by the occupancy vector II = Pi and the 
transition matrix T = {Tij}. The matrix T is the tran¬ 
sition probability of the Markov chain defined by: 

n(f + At) = TTI(t), (Bl) 

where 11(f) it the probability distribution at time t. If the 
trajectory is long enough, T is ergodic and time invari¬ 
ant, vector II coincides with the stationary distribution 
associated with the Markov chain II = Til. Morover, the 
detailed balance condition must hold: 

Tji-rri = Tjj-Kj. (B2) 

2. Stochastic Steepest Descent 

Once we have translated de molecular dynamics tra¬ 
jectories onto a CMN, we apply the stochastic steepest 
descent (SSD) algorithm [4Tj in order to split it into 
its basins of attraction in an efficient way, obtaining 
in turn useful thermo-statistical information about the 
system. The SSD algorithm is inspired in the determinis¬ 
tic steepest descent algorithm used to find minima in a 
multidimensional surface. We define the assisting vector 
U = {iq}, where i labels the nodes. The steps of the 
SSD algorithm are the following: 

1. We start with U = 0. 

2. Select randomly a node l with ui = 0 and write an 
auxiliary list of nodes adding l as first entry. 

3. Select within the neighbors of l the node m that fol¬ 
lows the maximum probability flux, this is T m i = 
max{ Tjiyj^i}. Check which of the following con¬ 
ditions is fulfilled: 

(a) If T m i > Ti m and u m = 0, add m to the list 
and go back to 3. using m instead of l. 
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(b) If T m i > Ti m and u m ^ 0 write the labels of 
all the nodes in the list as Uj = u m . Go back 
to step 3. 

(c) If T m i < Tim remove link l —> m from the 
graph. Return to point 3. 

This process ends when every node in the CMN has 
been labelled, this is Ui ^ 0Vi. Then, the whole confor¬ 
mational space has been characterized and every node is 
connected with its local minima in the FEL. All nodes 
with the same label belong to the same basin in this FEL 
and therefore we can associate them with the same con¬ 
formational state. 

Given the basin partition, a new CMN network can 
be built, taken the basins themselves as new nodes. 
The occupation probabilities will now be defined as 
Tr a = 7Tj, while the new transition matrix T is 

built, with elements Tp a = E , e /3 T ji^i/ Ei £Q n i- 

From these definitions, transition times can be easily 
calculated as t a p = r/Tp a , being r the time window 
used for the network construction. The relative free 
energy of basin a with respect to basin is simply 
A F a = — k B T \og(jt a / it p) . 


Appendix C: Thermal and mechanical 
characterization 

We start by characterizing the protein from a ther¬ 
mal and mechanical point of view, in order to know the 
suitable range of force and temperature to work with. Al¬ 
though more detailed characterizations have been made 
in previous works |36| we focus on the thermodynamical 
transition at T c , reflected on a peak in the heat capac¬ 
ity, as it can be seen in Fig. [6l The heat capacity is 
calculated as C p = (fcsT) _2 [(£ 1 2 3 4 5 6 7 8 9 ) — {E} 2 ], with E the 
total internal energy. We work at T — 0.55T C , below the 
transition, but with allowing enough fluctuation for the 
system to explore its configurational space. 


When applying force to the protein, it exhibits also a 
transition at F B , where the protein unfolds mechanically 
to the fully stretched configuration. At this force, the 
end-to-end distance £ increases abruptly, while the frac¬ 
tion of native contacts Q drops to 0. Around F = 0.75 Fjj 
a first change of behavior can be seen, due to the popu¬ 
lation of the half-stretched configuration, which leads to 

a drop to Q ~ 0.5 and £ ~ Inm. 
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FIG. 6. Thermal and mechanical characterization of the model 
protein. At T c it exhibits a thermodynamical unfolding transition, 
reflected in a peak on the heat capacity (in arbitrary units). Force 
also induces an unfolding transition at Fjj, leading to the fully 
stretched conformation. 
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